Feature selection
Resources
- https://en.wikipedia.org/wiki/Feature_selection
- http://machinelearningmastery.com/an-introduction-to-feature-selection/
- http://scikit-learn.org/stable/modules/feature_selection.html
- Removing features with low variance
- Univariate feature selection
- Recursive feature elimination (all three are sketched below)
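A minimal sketch of the three scikit-learn approaches listed above, assuming a synthetic dataset; the threshold, k, and choice of estimator are illustrative, not recommendations:

```python
# Sketch of the three scikit-learn approaches above on a toy dataset.
# Dataset, threshold, k and estimator are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

# 1. Removing features with low variance
X_var = VarianceThreshold(threshold=0.1).fit_transform(X)

# 2. Univariate feature selection (ANOVA F-test, keep the 10 best features)
X_uni = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)

# 3. Recursive feature elimination with a linear estimator
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10).fit(X, y)
X_rfe = rfe.transform(X)

print(X_var.shape, X_uni.shape, X_rfe.shape)
```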
Regularization
See AI/Supervised Learning/Regularized regression
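As one concrete example of regularization used for feature selection, a hedged sketch with an L1-penalized (Lasso) model and scikit-learn's SelectFromModel: the L1 penalty drives some coefficients to exactly zero, and only features with non-zero coefficients are kept. The dataset and alpha value are illustrative assumptions.

```python
# Sketch: L1 regularization (Lasso) as a feature selector via SelectFromModel.
# The alpha value and dataset are illustrative, not tuned.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=30, n_informative=5, noise=1.0, random_state=0)

lasso = Lasso(alpha=0.5).fit(X, y)
selector = SelectFromModel(lasso, prefit=True)  # keep features with non-zero coefficients
X_selected = selector.transform(X)
print(X.shape, "->", X_selected.shape)
```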
Tree-based methods
- https://scikit-learn.org/stable/modules/feature_selection.html#tree-based-feature-selection
- Random forests and extra trees: feature importances computed from forests of trees
- XGBoost: feature importance and why it matters:
- http://datawhatnow.com/feature-importance/
- http://machinelearningmastery.com/feature-importance-and-feature-selection-with-xgboost-in-python/
- Importance is calculated for a single decision tree as the amount by which each attribute's split points improve the performance measure, weighted by the number of observations the node is responsible for. The performance measure may be the purity (Gini index) used to select the split points, or another more specific error function. The feature importances are then averaged across all of the decision trees within the model (a sketch follows this list).
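A hedged sketch of selection by tree-based importances, using a random forest and scikit-learn's SelectFromModel; the dataset, number of trees, and threshold are illustrative assumptions. With XGBoost the same pattern applies via the model's feature_importances_ attribute or xgboost.plot_importance.

```python
# Sketch: feature importances from a forest of trees, then selection by threshold.
# Dataset, threshold and model settings are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=300, n_features=25, n_informative=5, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Importance of each feature, averaged over all trees in the forest
ranking = np.argsort(forest.feature_importances_)[::-1]
print("Most important features:", ranking[:5])

# Keep only features whose importance exceeds the mean importance
X_selected = SelectFromModel(forest, threshold="mean", prefit=True).transform(X)
print(X.shape, "->", X_selected.shape)

# With XGBoost the same idea applies (model.feature_importances_ or
# xgboost.plot_importance), assuming the xgboost package is installed.
```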
Books
Code
- #CODE Scikit-feature
- #CODE Feature-selector - Tool for dimensionality reduction of machine learning datasets (usage sketched below)
- Methods: Missing Values, Single Unique Values, Collinear Features, Zero Importance Features, Low Importance Features
- https://github.com/WillKoehrsen/feature-selector/blob/master/Feature Selector Usage.ipynb
- https://towardsdatascience.com/a-feature-selection-tool-for-machine-learning-in-python-b64dd23710f0
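A hedged usage sketch for Feature-selector, reconstructed from the README and blog post above; the dataset and column names are hypothetical, and the method and parameter names should be verified against the linked notebook.

```python
# Sketch of Feature-selector usage (method names follow the repository's README;
# treat them as assumptions and verify against the notebook linked above).
import pandas as pd
from feature_selector import FeatureSelector

train = pd.read_csv("train.csv")   # hypothetical dataset
labels = train.pop("target")       # hypothetical label column

fs = FeatureSelector(data=train, labels=labels)

fs.identify_missing(missing_threshold=0.6)             # Missing Values
fs.identify_single_unique()                            # Single Unique Values
fs.identify_collinear(correlation_threshold=0.98)      # Collinear Features
fs.identify_zero_importance(task="classification",     # Zero Importance Features
                            eval_metric="auc",
                            n_iterations=10,
                            early_stopping=True)
fs.identify_low_importance(cumulative_importance=0.99)  # Low Importance Features

train_removed = fs.remove(methods="all")
```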
- #CODE ITMO_FS
- Feature selection library in Python
- https://itmo-fs.readthedocs.io/en/latest/